A new algorithm designed to reduce the model dependence in future SUSY
searches at the LHC is described. This algorithm can dynamically adapt itself
to a wide range of possible SUSY final states, thus reducing the need for
detailed model-driven analysis. A preliminary study of its performance on
simulated MSSM, GMSB and AMSB final states is described, and a comparison
with traditional search procedures, whenever available, is performed.
1 Introduction



While the case for nature to be supersymmetric is very appealing, the way in
which Supersymmetry is broken is far from being established. The details of this
symmetry breaking determine the SUSY mass spectrum and consequently the way in
which SUSY will exhibit its existence at the LHC. An attempt to perform a virtually
model-independent search for a large class of possible SUSY final states is reported
in this note. The outline of the proposed wide-scope algorithm is presented in the
next section. The widening of the scope of the search is achieved by dynamic
adaptation of the algorithm to the peculiarities of the signal. Such a procedure is
likely to result in a reduction of the search sensitivity when compared to
sophisticated dedicated analysis techniques like Artificial Neural Networks (ANN),
which are based on prior knowledge of the signal characteristics, but the
deterioration is shown to be marginal and the algorithm performs significantly
better than simple cuts. It is argued that a combination of the traditional
model-driven searches and the present wide-scope procedure will allow ATLAS to
conduct the most effective search for SUSY (and probably other) final states. These
statements are substantiated by the MC studies that are described in the following
sections.

2 Description of the technique 

The exact nature of the expected SUSY final states depends on the details of the way
SUSY is broken and is as yet unknown. Be that as it may, one has some general hints
about the nature of such final states:

• Very high mass: since SUSY particles must be heavy (Tevatron, LEP); 

• Large missing energy: at least in all RPC models due to the existence of a 
neutral practically non-interacting LSP. 

An attempt to construct a search procedure in the most general way possible, based 
on these hints, is described in this note. 

2.1 The LSL algorithm 

The K-neighborhood algorithm [1] was modified in such a way that it can cope with
the task of finding small deviations from the simulated expectations, which might
result from the presence of an unspecified signal. In the modified algorithm, the
LSL (Local Spherical Likelihood) [2], each event is described by N parameters and
is represented by a point in a corresponding N-dimensional space, where the N axes
correspond to the N parameters. The generic name for such a space is the
'event-space'. The choice of parameters (i.e. axes) is crucial as it determines to
which type of signal the analysis will be sensitive. This is the place where model
dependence is introduced into the procedure. Once the parameters (i.e. the axes)
have been chosen, one normalizes them (usually the parameters are mapped in such a
way that they are centered at zero and distributed between zero and one) in order
to remove the effect of variable scaling.

Next, one runs a simulation of all the relevant SM processes and places each of
the simulated events in an event-space, which is named the 'reference' space. One
then proceeds by constructing a similar event-space using all data events; this
event-space is named the 'data' space.

The essence of the algorithm is to look for local accumulations of events in the
'data' space which are absent in the 'reference' space (see footnote 3). In the LSL,
in order to expose the existence of such a local high-density region, each of the
data events is placed inside the 'reference' space and an N-dimensional sphere is
traced around it. The radius of this sphere is adjusted in such a way that it is the
minimum that is required to contain exactly N_B reference events, where N_B is a
predefined number. The radius of this sphere is then recorded. Next, a sphere with
the same radius is traced around the same data event, but this time in the 'data'
space, and the number of data events that are contained in the sphere, N_D, is
determined.

In the absence of signal one expects N_D ≈ N_B. If signal were present one would
expect N_D > N_B.
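
The sphere construction described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' implementation: the function names are hypothetical, events are plain tuples of already-normalized parameters, the 'reference' and 'data' samples are assumed to correspond to the same luminosity, and the data event at the centre of the sphere is counted in N_D (a choice the text does not specify).

```python
import math

def distance(a, b):
    """Euclidean distance between two points of the N-dimensional event space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sphere_radius(center, reference, n_b):
    """Smallest radius around `center` that contains n_b reference events."""
    dists = sorted(distance(center, r) for r in reference)
    return dists[n_b - 1]

def count_inside(center, events, radius):
    """Number of events within `radius` of `center`.

    Assumption: the central event itself is counted if it belongs to `events`.
    """
    return sum(1 for e in events if distance(center, e) <= radius)

def local_counts(event, reference, data, n_b):
    """Return (radius, N_D) for one data event and a predefined N_B."""
    radius = sphere_radius(event, reference, n_b)
    return radius, count_inside(event, data, radius)

# Toy one-dimensional example (illustrative numbers only).
reference = [(0.0,), (1.0,), (2.0,), (3.0,)]
data = [(0.0,), (0.5,), (0.9,), (5.0,)]
radius, n_d = local_counts(data[0], reference, data, n_b=2)
```

A brute-force sort is used here for clarity; for realistic sample sizes a spatial index (for example a k-d tree) would presumably be preferable.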

In order to discern the presence of signal, a quantity p, built from the difference

    N_D - N_B,

is computed. A large value of p, which is the parameter that quantifies the local
deviation of the density of the data from the density of the background, is
therefore an indication of the possible existence of a signal.

3 Another algorithm which is designed for the same task is Sleuth, which has been
developed and used at the Tevatron [?]. The present algorithm is conceptually simpler.

The numerical value above which p can be considered large enough to constitute
evidence for the presence of a signal in the data is not well defined at this stage.
In order to estimate this value one makes use of additional SM simulated events
(which are not used for the construction of the 'reference' space) and constructs a
'null' space, namely a data-like event-space in which, instead of data, one places
SM simulated events (without a signal). One can then repeat the procedure outlined
above for the 'null' space and get the p distribution for the no-signal hypothesis.
The actual value of p, as computed from the data, can now be compared with the
null hypothesis and acquire a meaningful statistical interpretation.
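
The comparison with the 'null' space can be illustrated with a small sketch. Everything here is an assumption made for illustration: the helper names are hypothetical, the normalization p = (N_D - N_B)/sqrt(N_B) is one plausible choice and is not taken from the text, and the one-dimensional toy spaces merely mimic a flat background with a tight cluster of 'signal' events.

```python
import math

def dist(a, b):
    """Euclidean distance in the event space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def local_p(event, reference, sample, n_b):
    """Local deviation for one event.

    ASSUMPTION: p = (N_D - N_B) / sqrt(N_B); the normalization is a guess.
    """
    radius = sorted(dist(event, r) for r in reference)[n_b - 1]
    n_d = sum(1 for s in sample if dist(event, s) <= radius)
    return (n_d - n_b) / math.sqrt(n_b)

def p_distribution(reference, sample, n_b):
    """p for every event of `sample` ('data' or 'null'), measured against `reference`."""
    return [local_p(ev, reference, sample, n_b) for ev in sample]

def tail_fraction(null_p, p_observed):
    """Fraction of null-space events with p at least as large as p_observed."""
    return sum(1 for v in null_p if v >= p_observed) / len(null_p)

# Toy 1-D spaces: evenly spread background plus a tight cluster of 20 'signal' events.
reference = [(i / 100,) for i in range(100)]
null = [((i + 0.5) / 100,) for i in range(100)]
data = null + [(0.5 + j * 0.0001,) for j in range(20)]

null_p = p_distribution(reference, null, n_b=5)
data_p = p_distribution(reference, data, n_b=5)
fraction = tail_fraction(null_p, max(data_p))
```

In this toy the largest p found in the 'data' space lies far above anything seen in the 'null' space, which is the kind of statement the null distribution is meant to make quantitative.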

Figure 1b shows the p distribution (for fixed N_B = 21) for the signal case (upper
red histogram) and for a background case (lower blue histogram). The peak at p ≈ 13
in the signal case is an artifact of the situation in which the N-dimensional sphere
is located at the center of a well-separated cluster of signal events. The sphere
contains all the signal in the cluster and the radius is then artificially enlarged
to include the required 21 background events. Thus, the spheres around different
data points inside this cluster contain the same set of background and, consequently,
signal events. As a result the p value of all the events in this cluster is roughly
the same.

The size of the sphere, namely the numerical value of N_B, depends on the number of
simulated events as well as on the shape that the signal cloud takes inside the event
space. Since the second factor is unknown, the procedure is repeated for values of
N_B ranging from some minimal value N_min (21 in the present study) to some fixed
value, say 5% of the number of background events in the reference space. The maximal
attainable p in the series p(N_B) is denoted by p_max.

As mentioned above, a large value of p_max is a strong indication of the existence
of a signal in the data.

The variation of p as a function of N_B is shown in Figure 1a for typical background
and signal events. Since p(N_B) is strongly correlated with p(N_B - 1), the maximum
value of p as obtained by this procedure is fairly stable. It is disadvantageous to
evaluate p at low values of N_B since for such values the statistical error is large.
It is equally disadvantageous to evaluate p at high values of N_B since then the
radius of the sphere is large and one loses the local nature of the analysis.
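
The scan over N_B that yields p_max can be sketched as follows. This is a hypothetical illustration rather than the authors' code: the function names are invented, p is again assumed to be (N_D - N_B)/sqrt(N_B), and the toy scan runs over a handful of N_B values, whereas the text scans from 21 up to a few percent of the reference sample.

```python
import bisect
import math

def dist(a, b):
    """Euclidean distance in the event space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def p_scan(event, reference, sample, n_min, n_max):
    """Return {N_B: p(N_B)} for one event and N_B = n_min .. n_max.

    ASSUMPTION: p = (N_D - N_B) / sqrt(N_B).
    """
    ref_d = sorted(dist(event, r) for r in reference)
    sam_d = sorted(dist(event, s) for s in sample)
    scan = {}
    for n_b in range(n_min, n_max + 1):
        radius = ref_d[n_b - 1]                   # sphere holding n_b reference events
        n_d = bisect.bisect_right(sam_d, radius)  # sample events with distance <= radius
        scan[n_b] = (n_d - n_b) / math.sqrt(n_b)
    return scan

def p_max(scan):
    """Return (N_B, p) at which p(N_B) is largest."""
    best_n_b = max(scan, key=scan.get)
    return best_n_b, scan[best_n_b]

# Toy 1-D example: flat background plus a tight cluster of 20 events near 0.5.
reference = [(i / 100,) for i in range(100)]
sample = [((i + 0.5) / 100,) for i in range(100)]
sample += [(0.5 + j * 0.0001,) for j in range(20)]

scan = p_scan((0.5,), reference, sample, n_min=4, n_max=8)
best_n_b, best_p = p_max(scan)
```

Sorting the distances once per event and reusing them for every N_B keeps the scan cheap, since neighbouring N_B values share almost the same sphere.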

At that point one can select a fixed N_B for which the attainable p are large or
continue with p_max at the cost of having a variable N_B. The results which are
presented below have been obtained using the best p_max(N_B) attainable provided
N_B > 20. The dependence of the sensitivity of the analysis on N_B is shown in
Figure 2 for the case of GMSB with Λ = 170 TeV, M = 1000 TeV, tan β = 15 and a
positive μ. One sees that at small N_B the performance is not very good because of
the statistical fluctuations, while for large N_B it decreases because of the
limited number of signal events. The optimal N_B in this case is found to be at
about 60.

Figure 1: a) The variation of p as a function of the size of the sphere (N_B) for
typical background (black line, lower band) and signal (red line, upper band) events.
b) p_max distribution for the signal case (upper red solid histogram) and for a
simulated background (lower black histogram). Note the long tail of high-p events in
the signal plot. The small peak at large p_max in the upper plot is discussed in the
text.

The whole sequence of steps is summarized in the following list:

1. Choose the parameters (motivated by physics considerations);

2. Apply a set of soft preliminary cuts (to remove irrelevant events);

3. Scale and normalize the events' parameters;

4. Form a 'reference' space from all relevant SM background processes;

5. Form a 'null' space by simulating additional SM events;

6. Apply the procedure that was described before for obtaining the p distributions
to the 'reference' space and the 'null' space, and obtain the distribution of
p_max^null. The number of events in the 'null' space should be as large as
possible (see footnote 4);

7. Form the 'data' space using preselected data events;

8. Apply the procedure that was described before for obtaining the p distributions
to the 'reference' space and the 'data' space, and obtain the distribution of
p_max^data;

9. Compute σ(p) = (N_data - N_null) / √N_null, where N stands for the number of
events with p > p_cut (see footnote 5), and maximize this value by changing the
value of p_cut.
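
The last step of the list can be sketched as a simple scan over candidate cut values. The sketch assumes the Gaussian form (N_data - N_null)/sqrt(N_null) for the significance and a 'null' sample normalized to the size of the 'data' sample; the function names and the toy p values are hypothetical.

```python
import math

def significance(data_p, null_p, cut):
    """(N_data - N_null) / sqrt(N_null) for events with p > cut.

    ASSUMPTION: simple Gaussian estimate; returns None when no null events
    survive the cut, since the estimate is then undefined.
    """
    n_data = sum(1 for p in data_p if p > cut)
    n_null = sum(1 for p in null_p if p > cut)
    if n_null == 0:
        return None
    return (n_data - n_null) / math.sqrt(n_null)

def best_cut(data_p, null_p, cuts):
    """Return (cut, significance) maximizing the significance over `cuts`."""
    best = None
    for cut in cuts:
        s = significance(data_p, null_p, cut)
        if s is not None and (best is None or s > best[1]):
            best = (cut, s)
    return best

# Toy p_max values (illustrative only).
data_p = [0.5, 1.5, 2.5, 3.5, 3.6, 3.7, 4.2]
null_p = [0.2, 0.8, 1.2, 2.1, 3.1]
result = best_cut(data_p, null_p, cuts=[0, 1, 2, 3, 4])
```

Cuts that leave no null events are skipped rather than rewarded, which avoids selecting a cut only because the background estimate has run out of statistics.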



4 In order to speed up the calculation, the null space was split into several smaller
subspaces that are equal in size to the data space. The LSL algorithm is then applied
to each of these subspaces separately, and the average p_max is used.

5 The simplest possible statistical approach is taken here for simplicity's sake.
Obviously, if the numbers involved are small, a Poisson distribution would be more
adequate.

Figure 2: The statistical significance of the MSSM search analysis as a function of
N_B.

3 Implementation

While the algorithm which was described above can be used to search for any
conceivable signal, we study its performance here by applying it to various RPC SUSY
simulated signals. This decision determines, as was discussed, the selection of
parameters by which each event is described. On the one hand one would like to have
all the relevant parameters that one can think of, but on the other hand a large
number of parameters will necessitate a huge number of simulated events and will
make the procedure either slow or useless. Hence, only 4 input parameters, with the
highest 'separation' power, were selected. In order to conform with existing analyses
two sets are used: one in which no requirement on the presence of leptons in the
event is set, and another one in which one lepton is required and its properties are
included in the input parameters. The parameters for the 'no-lepton' case were:

• E_T^miss - the missing transverse energy of the event;

• P_T^jet1 - the transverse momentum of the most energetic (in the transverse
direction) jet;

• P_T^jet2 - the transverse momentum of the second most energetic (in the transverse
direction) jet;

• ΣE_T - the total transverse energy of the event.

In the case of the one-lepton channel the 4 input variables are:

• E_T^miss - the missing transverse energy of the event;

• P_T^jet1 - the transverse momentum of the most energetic (in the transverse
direction) jet;

• M_T^(l-miss) - the transverse mass of the lepton-missing momentum system;

• ΣE_T - the total transverse energy of the event.
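
The text does not spell out how the transverse mass of the lepton-missing momentum system is computed. A commonly used definition for a massless lepton, assumed here purely for illustration, is M_T = sqrt(2 p_T^l E_T^miss (1 - cos Δφ)), where Δφ is the azimuthal angle between the lepton and the missing transverse momentum. The function and argument names below are hypothetical.

```python
import math

def transverse_mass(pt_lepton, et_miss, delta_phi):
    """Transverse mass of the lepton + missing-momentum system (GeV).

    ASSUMPTION: massless-lepton formula
    M_T = sqrt(2 * pT(lepton) * ET(miss) * (1 - cos(delta_phi))).
    """
    return math.sqrt(2.0 * pt_lepton * et_miss * (1.0 - math.cos(delta_phi)))

# Back-to-back lepton and missing momentum of 40 GeV each.
mt_example = transverse_mass(40.0, 40.0, math.pi)
```

With this definition a lepton and a neutrino from a W decay give M_T values bounded by the W mass, which is the usual rationale for a cut near 80 GeV.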

The SM processes that have been simulated (using Pythia) for this study are:
pp → WX; pp → ZX; pp → ttbar; pp → two jets. The equivalent luminosity was set to
10 fb^-1 and, in order to keep the number of events reasonable, a p_T cut of 200 GeV
(via ckin(3) [3]) was applied. The effect of this cut was checked later and verified
to be of negligible importance.

The signal was simulated using Pythia [4] (for MSSM) and ISAJET (for GMSB
and AMSB). The detector response was simulated using a fast simulation program (see
footnote 6).
The ATLAS TDR [H] as well as some additional points were used in this study. 

In order to reduce the number of background events in the various event spaces, a set
of preliminary cuts was applied:

• E_T^miss > 500 GeV: motivated by the presence of two LSPs in each event;

• P_T^jet1 > 200 GeV: this cut and the two that follow reflect the high mass of the
expected SUSY particles;

• P_T^jet2 > 100 GeV;

• ΣE_T > 1500 GeV;

• N_jet > 3: this cut and the one that follows are based on the fact that SUSY
events are expected to give rise to long cascade decay chains;

• C > 0.1, where C is the circularity of the event.

For the one-lepton analysis the presence of a lepton with p_T > 10 GeV and |η| < 2.5
allows softening some of the cuts. The preliminary cuts were therefore:

• E_T^miss > 200 GeV;

• N_jet > 3;

• P_T^jet1 > 100 GeV;

• P_T^jet2 > 50 GeV;

• ΣE_T > 200 GeV;

• M_T^(l-miss) > 80 GeV: this cut removes most of the W + jet background.
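
The two sets of preliminary cuts can be written down as a short sketch. The thresholds are those quoted above (in GeV where applicable); the dictionary keys, the function names and the representation of an event as a dictionary are assumptions made for illustration. The strict inequality on the number of jets follows the text as printed.

```python
# Thresholds quoted in the text; every quantity must exceed its threshold.
NO_LEPTON_CUTS = {
    "et_miss": 500.0, "pt_jet1": 200.0, "pt_jet2": 100.0,
    "sum_et": 1500.0, "n_jets": 3, "circularity": 0.1,
}
ONE_LEPTON_CUTS = {
    "et_miss": 200.0, "n_jets": 3, "pt_jet1": 100.0,
    "pt_jet2": 50.0, "sum_et": 200.0, "mt_lep_miss": 80.0,
}

def passes_cuts(event, cuts):
    """True if every quantity named in `cuts` exceeds its threshold."""
    return all(event[name] > threshold for name, threshold in cuts.items())

def passes_one_lepton(event):
    """One-lepton preselection: lepton acceptance plus the softened cuts."""
    has_lepton = event["pt_lepton"] > 10.0 and abs(event["eta_lepton"]) < 2.5
    return has_lepton and passes_cuts(event, ONE_LEPTON_CUTS)

# Hypothetical events (keys and values are illustrative only).
hadronic = {"et_miss": 600.0, "pt_jet1": 350.0, "pt_jet2": 150.0,
            "sum_et": 1800.0, "n_jets": 4, "circularity": 0.3}
leptonic = {"et_miss": 250.0, "pt_jet1": 120.0, "pt_jet2": 60.0,
            "sum_et": 700.0, "n_jets": 4, "mt_lep_miss": 95.0,
            "pt_lepton": 25.0, "eta_lepton": -1.2}
```

Keeping the thresholds in plain dictionaries makes it easy to compare the two channels side by side or to loosen a single cut when studying its effect.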

Since the main goal of the present study is the investigation of the performance of
the LSL algorithm, no attempt to look for optimal preselection cuts was made. Rather,
the quantities that were used in [8] are used.

6 The Fortran version of the ATLAS fast simulation program (ATLFAST), version 2.53.
4 Results



The sensitivity of ATLAS to predicted signals of several RPC SUSY models was
estimated using the LSL algorithm. In the case of MSSM and AMSB, it was possible
to compare the LSL sensitivity to that of conventional procedures. Recently, a
comprehensive evaluation of ATLAS's sensitivity to an MSSM signal was performed [8].
On top of introducing a new channel, namely the missing energy channel with no
requirement on leptons, which proved to be the best search channel, this study also
introduced a sophisticated automatic cut optimization procedure which is based on
the Simplex algorithm. Figure 3 is a comparison between this technique, in which the
signal is simulated and cuts are optimized at numerous points, and the LSL algorithm,
in which no simulation of the signal was used at all.

Figure 4a is a similar comparison between the two methods when only events with
no leptons are considered. A somewhat complementary case, namely the case when
events are required to have one isolated energetic lepton, is shown in Figure 4b. An
attempt to combine these two searches was also carried out. Such a combination,
which is similar to the one applied in Higgs boson searches at LEP, is expected to
lead to an improved sensitivity. However, the improvement which was obtained was
only marginal.

Generally speaking, one may conclude from these plots that the sensitivity of the two
methods is comparable. Yet one should bear in mind that the LSL algorithm did
not make any use of simulated signal. For completeness' sake, the sensitivity of
ATLAS to an MSSM signal as estimated with the LSL algorithm for luminosities of
1, 10 and 100 fb^-1 is presented in Figure 5.

A study of ATLAS's sensitivity to a possible AMSB signal was carried out by Barr,
Allanach, Lester and Parker [9]. In order to extract a signal, a set of quantities
was selected and subjected to various cuts. Ten sets of such cuts were used for the
various analyses that were performed: 0-lepton, 1-lepton, 2 oppositely charged
leptons, etc. In order to compare the LSL performance with this analysis while
keeping the wide-scope approach, the null and reference spaces that were used in the
MSSM case were also used here. No modification whatsoever was introduced except for
the introduction of a simulated AMSB signal into the data space instead of the MSSM
one. The comparison of the E_T^miss analyses is shown in Figure 6.

The two analyses are again comparable except for the right side of Figure 6, where
the LSL performance is inferior to the conventional technique. This behavior is
related to the number of events with a large number of jets and the differences in
their simulation between Herwig (used in [9]) and Pythia (background simulation in
the LSL case). Note that the sensitivity region here is estimated by S/√B > 5 and
S > 10.

Figure 3: The sensitivity reach of ATLAS to MSSM signal in the missing energy
channel with no requirements set on the number of leptons in the event. The solid
(black) line is from [8] and the dashed (red) one is the result of the LSL algorithm.

Figure 4: The sensitivity reach of ATLAS to MSSM signal in the missing energy
channel with no leptons in the event (a) and with one lepton in the event (b). The
solid (black) line is from [8] and the dashed (red) one is the result of the LSL
algorithm.

Figure 5: The sensitivity reach of ATLAS to MSSM signal as estimated using the
LSL algorithm for luminosities of 1, 10 and 100 fb^-1.

Figure 6: A comparison between the LSL sensitivity and the published results of
the search for AMSB signal. The blue circled area represents the estimated sensitivity
of the dedicated search while the thick dotted black line is the LSL sensitivity limit.

For completeness, a three-luminosity contour, with 1, 10 and 100 fb^-1, is also given
when the sensitivity is estimated with a more stable estimator, namely requiring
S/√(S+B) > 5, where S and B are the numbers of signal and background events
respectively.
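
The difference between the two sensitivity criteria can be made concrete with a short sketch. The thresholds follow the text (S/√B > 5 with S > 10, and S/√(S+B) > 5); the function names and the example event counts are hypothetical.

```python
import math

def s_over_sqrt_b(s, b):
    """S / sqrt(B); grows without bound as the background estimate B goes to zero."""
    return s / math.sqrt(b) if b > 0 else math.inf

def s_over_sqrt_s_plus_b(s, b):
    """S / sqrt(S + B); stays finite even when B is very small."""
    return s / math.sqrt(s + b) if s + b > 0 else 0.0

def observable_simple(s, b):
    """Criterion quoted for the AMSB comparison: S/sqrt(B) > 5 and S > 10."""
    return s > 10 and s_over_sqrt_b(s, b) > 5

def observable_stable(s, b):
    """Criterion based on the more stable estimator: S/sqrt(S+B) > 5."""
    return s_over_sqrt_s_plus_b(s, b) > 5

# A low-background point passes the first criterion but not the second.
low_background = (observable_simple(20, 4), observable_stable(20, 4))
```

The second estimator is less sensitive to fluctuations of a small background estimate, which is presumably why the text calls it more stable.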

A similar procedure was repeated for the GMSB case. The LSL inputs were left
unchanged and the estimated ATLAS sensitivity is shown in Figure 8. It is found
again to be comparable to the one which was obtained with a naive set of conventional
cuts mg. 



The LSL is basically looking for deviations of the data from the SM expectation, 
as predicted by the simulation. As such it might be sensitive to the quality of the 
simulation. Defects in the simulation can easily be misinterpreted as indication of 
a signal. Some preliminary studies of the stability of the algorithm under artificial 
distortion of the simulation are described in Appendix A. 



5 Conclusion 

The LSL sensitivity was shown to be comparable to the one attainable by carefully
adjusting the cuts to a signal of well-known characteristics. It is possible that a
more sophisticated analysis, which is based on likelihood or artificial neural
networks, will be superior to the LSL algorithm. Yet one should bear in mind that
the LSL algorithm did not make any use of simulated signal. Hence, the LSL will be
able to observe signals of unpredicted nature, and once such deviations are exposed
they will be studied using all available analysis tools.
Figure 7: The sensitivity reach of ATLAS in the AMSB parameter space for
luminosities of 1, 10 and 100 fb^-1.
Appendix A



Differences between the data and the simulated SM expectation trigger the LSL
algorithm. Such differences may indicate the presence of a signal but might also
result from bad modelling of the detector and/or of the various SM processes.
In order to evaluate the effect of the latter sources, a few preliminary studies have
been done. The first test checked the sensitivity to energy calibration. The energy
of the 'measured' events (i.e. those in the data space) was scaled down by 5% while
that of the simulated SM (the reference space) was left untouched. The
efficiency/purity of the signal selection procedure for an MSSM signal under these
conditions was compared to the one which was obtained under normal conditions. The
results are shown
in Figure 

Another potential source of fake signal is mismodelling of SM processes. In order to
evaluate the importance of this source of trouble, the ttbar process was scaled down
by 10% in the reference space, leaving the 'data' richer in ttbar by 10% than
'predicted'.
The efficiency vs purity performance curve of the LSL is shown in Figure ^jp. 

One may conclude from these two tests that the algorithm is fairly stable to the tested 
forms of distortion. 
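
The two distortions can be mimicked with a short sketch. It is an assumption-laden illustration: events are dictionaries with hypothetical keys, the energy scale is applied to every energy-like quantity listed in `keys`, and the 10% reduction of one process is implemented by dropping every tenth event of that process (reweighting would be an equally plausible choice).

```python
ENERGY_KEYS = ("et_miss", "pt_jet1", "pt_jet2", "sum_et")

def scale_energy(events, factor, keys=ENERGY_KEYS):
    """Return copies of the events with the energy-like quantities scaled by `factor`."""
    return [{k: (v * factor if k in keys else v) for k, v in ev.items()}
            for ev in events]

def thin_process(events, process, drop_every=10):
    """Drop one out of every `drop_every` events of `process` (10% for the default)."""
    kept, seen = [], 0
    for ev in events:
        if ev["process"] == process:
            seen += 1
            if seen % drop_every == 0:
                continue
        kept.append(ev)
    return kept

# Hypothetical samples (illustrative only).
data = [{"process": "data", "et_miss": 100.0, "n_jets": 4}]
reference = [{"process": "ttbar", "et_miss": 50.0} for _ in range(20)]
reference += [{"process": "W", "et_miss": 60.0} for _ in range(5)]

miscalibrated_data = scale_energy(data, 0.95)       # data energies 5% too low
distorted_reference = thin_process(reference, "ttbar")  # 10% fewer ttbar events
```

Running the search once with the nominal spaces and once with the distorted ones would then allow the efficiency versus purity curves to be compared.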